{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7d215e9d",
   "metadata": {},
   "source": [
    "# Deploying MultiModal Models with LMI Deep Learning Containers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d37f78b",
   "metadata": {},
   "source": [
    "In this tutorial, you will use the LMI DLC available on SageMaker to host and serve inference for a MultiModal model. We will be using the [Llava-v1.6](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf) model available on the HuggingFace Hub.\n",
    "\n",
    "Please make sure that you have an IAM role with SageMaker access enabled before proceeding with this example. \n",
    "\n",
    "For a list of supported multimodal models in LMI, please see the documentation [here]()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ead96092",
   "metadata": {},
   "source": [
    "## Step 1: Install Notebook Python Dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13375a46",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install sagemaker --upgrade --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82bff4b3",
   "metadata": {},
   "source": [
    "## Step 2: Leverage the SageMaker PythonSDK to deploy your endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "de290482",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "from sagemaker.djl_inference import DJLModel\n",
    "\n",
    "role = sagemaker.get_execution_role() # iam role for the endpoint\n",
    "session = sagemaker.session.Session() # sagemaker session for interacting with aws APIs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f524fed5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Choose a specific version of LMI image directly:\n",
    "image_uri = \"763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-lmi11.0.0-cu124\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07c5f812",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = DJLModel(\n",
    "    model_id=\"llava-hf/llava-v1.6-mistral-7b-hf\",\n",
    "    role=role,\n",
    "    image_uri=image_uri,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f63dd41",
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor = model.deploy(initial_instance_count=1, instance_type=\"ml.g6.4xlarge\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ca76e2d",
   "metadata": {},
   "source": [
    "## Step 3: Make Inference Predictions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cddff26",
   "metadata": {},
   "source": [
    "For multimodal models, LMI containers support the [OpenAI Chat Completions Schema](https://platform.openai.com/docs/guides/chat-completions). You can find specific details about LMI's implementation of this spec [here](https://docs.djl.ai/docs/serving/serving/docs/lmi/user_guides/chat_input_output_schema.html).\n",
    "\n",
    "The OpenAI Chat Completions Schema allows two methods of specifying the image data:\n",
    "\n",
    "* an image url (e.g. https://resources.djl.ai/images/dog_bike_car.jpg)\n",
    "* base64 encoded string of the image data\n",
    "\n",
    "If an image url is provided, the container will make a network call to fetch the image. This is ok for small applications and experimentation, but is not recommended in a production setting. If you are in a network isolated environment you must use the base64 encoded string representation.\n",
    "\n",
    "We will demonstrate both mechanisms here."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31ffd634",
   "metadata": {},
   "source": [
    "### Getting a Test Image"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a3e833e",
   "metadata": {},
   "source": [
    "You are free to use any image that you want. In this example, we'll be using the following image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17a75789",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install Pillow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4719217b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib.request\n",
    "from PIL import Image\n",
    "\n",
    "image_url = \"https://resources.djl.ai/images/dog_bike_car.jpg\"\n",
    "image_path = \"dog_bike_car.jpg\"\n",
    "# download the image locally\n",
    "urllib.request.urlretrieve(image_url, image_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60205c39",
   "metadata": {},
   "outputs": [],
   "source": [
    "img = Image.open('dog_bike_car.jpg')\n",
    "img.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3dc49c44",
   "metadata": {},
   "source": [
    "### Using the image http url directly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9da8ded3",
   "metadata": {},
   "outputs": [],
   "source": [
    "messages = {\n",
    "    \"messages\": [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"text\",\n",
    "                    \"text\": \"What is this image of?\"\n",
    "                }, \n",
    "                {\n",
    "                    \"type\": \"image_url\",\n",
    "                    \"image_url\": {\n",
    "                        \"url\": image_url\n",
    "                    }\n",
    "                }\n",
    "            ]\n",
    "        }\n",
    "    ]\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c150722b",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = predictor.predict(messages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c616bd96",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(response[\"choices\"][0][\"message\"][\"content\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6726896",
   "metadata": {},
   "source": [
    "## Using the base64 encoded image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd9cb2c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import base64\n",
    "\n",
    "def encode_image_base64(image_path):\n",
    "    with open(image_path, \"rb\") as image_file:\n",
    "        return base64.b64encode(image_file.read()).decode('utf-8')\n",
    "    \n",
    "encoded_image = encode_image_base64(image_path)\n",
    "base64_image_url = f\"data:image/jpeg;base64,{encoded_image}\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7dbbc253",
   "metadata": {},
   "outputs": [],
   "source": [
    "messages = {\n",
    "    \"messages\": [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"text\",\n",
    "                    \"text\": \"What is this image of?\"\n",
    "                }, \n",
    "                {\n",
    "                    \"type\": \"image_url\",\n",
    "                    \"image_url\": {\n",
    "                        \"url\": base64_image_url\n",
    "                    }\n",
    "                }\n",
    "            ]\n",
    "        }\n",
    "    ]\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b4ba4c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = predictor.predict(messages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98dc9352",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(response[\"choices\"][0][\"message\"][\"content\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b428bc6d",
   "metadata": {},
   "source": [
    "## Clean Up Resources"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eccc5c7b",
   "metadata": {},
   "source": [
    "This example demonstrates how to use the LMI container to deploy MultiModal models and serve inference requests. The 0.29.0-lmi container supports a variety of multimodal models using the OpenAI Chat Completions API spec. In the future, we plan to increase the set of multimodal architectures supported, as well as provide additional API specs that can be used to make inference requests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75481491",
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor.delete_endpoint()\n",
    "model.delete_model()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8fea0f2",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
